ABSTRACT


Hurricane track forecasts from a multi-model ensemble, formed by linearly
combining individual model forecasts, have greatly reduced average forecast
errors when compared to individual dynamic model forecast errors. In this
experiment, a complex adaptive system, the Tropical Agent Forecaster (TAF), is
created to fashion a ‘smart’ ensemble forecast. The TAF uses autonomous agents
to assess the historical performance of individual models and model
combinations, called predictors, and weights them based on their average error
compared to the best-track information. Agents continually monitor themselves
and determine which predictors, for the life of the storm, perform the best in
terms of the distance between forecast and best-track positions. A TAF forecast
is developed using a linear combination of the highest weighted predictors.
When applied to the 2004 Atlantic hurricane season, the TAF system, with a
requirement to contain a minimum of three predictors, consistently outperformed
the CONU forecast at 72 and 96 hours for a homogeneous data set, although the
differences were not statistically significant. At 120 hours, the TAF system
significantly decreased the average forecast errors when compared to the CONU.
I.  INTRODUCTION


A.  OBJECTIVE 

In  the  Chief  of  Naval  Operations  vision,  “Sea  Power  21:  Projecting 
Decisive  Joint  Capabilities,”  Admiral  Clark  lays  out  the  three  fundamental 
concepts  required  for  achieving  this  vision:  Sea  Strike,  Sea  Shield,  and  Sea 
Basing.  Sea  Strike  is  the  ability  to  project  offensive  firepower  for  a  sustained 
period  throughout  the  world.  Sea  Shield  ensures  defenses  are  continuously 
available  and  Sea  Basing  is  the  ability  to  operate  independently  on  the  seas  in 
support  of  joint  forces.  Sea  Power  21  requires  a  joint,  networked  force  fed  by 
superior  information  in  order  to  gain  a  tactical  advantage  (Clark  2002).  Under  the 
CNO’s  vision  of  optimizing  the  world’s  largest  maneuvering  area,  the  seas,  it  is 
essential  all  meteorological  events  be  accurately  predicted  to  allow  for  planners 
to  optimally  place  their  assets  to  exploit  the  operating  environment. 

The  ability  to  accurately  predict  the  path  and  intensity  of  hurricanes  will 
provide  Navy  decision  makers  with  superior  information  to  determine  the  best 
placement  for  naval  assets.  In  recent  years,  the  use  of  artificial  intelligence  has 
become  more  prevalent  during  the  current  time  of  decreasing  budgets  and 
manpower.  The  ability  to  model  events  that  mimic  real  life  scenarios  saves  the 
Department  of  Defense  (DoD)  millions  of  dollars  annually.  While  most  DoD 
ventures into artificial intelligence deal with war-gaming, this experiment will try
to use a type of artificial intelligence, adaptive software, to improve hurricane
track forecasting.

The objective of this study is twofold. The first objective is to create a
hurricane  forecast  that  will  produce  smaller  errors  than  a  consensus  forecast  of 
dynamical  models.  The  second  objective  is  to  prove  an  adaptive  system  is 
capable  of  providing  the  forecaster  an  objective  prediction  of  a  hurricane’s  path 
based  on  a  weighted  comparison  of  the  adaptive  system’s  decisions  and  ground 
truth. 
B.  MOTIVATION


The  ability  to  reduce  position  and  intensity  errors  for  hurricane  forecasting 
is  a  vital  issue  to  the  United  States  Navy.  During  the  2004  Atlantic  Hurricane 
Season, hurricanes caused $45 billion in devastation. The ability to accurately
predict a hurricane’s path and potential landfall region far enough in advance to
save lives and infrastructure is of great importance to the Navy and civilian officials. The
cost  to  sortie  the  Atlantic  Fleet  runs  into  the  millions  of  dollars.  Coastal 
evacuations  cost  local  economies  millions  in  lost  revenues  and  wages.  An 
accurate,  early  hurricane  track  forecast  is  essential  for  planners  to  minimize  the 
cost  of  these  storms  in  both  lives  and  damage. 


C.  BACKGROUND 

During  the  last  decade  numerical  track  prediction  models  have  drastically 
improved  and  have  become  indispensable  for  operational  forecasters.  This  has 
led  to  a  large  number  of  available  model  forecasts  that  has  actually  turned  into  a 
problem  for  forecasters.  The  large  spread  of  future  storm  positions  has  led  to 
numerous  studies  as  to  which  model  is  performing  the  best  (Weber  2003). 
Adaptive  Software,  when  applied  to  historical  model  data,  has  the  ability  to  make 
forecast  model  selections  in  real  time. 

1.  Multi-model  Ensemble  Forecasting 

Goerss (2000) has shown that a consensus forecast, created by the linear
combination of positions from three dynamic models, outperformed the individual
models. To analyze the Atlantic hurricane season, Goerss used the Navy
Operational Global Atmospheric Prediction System (NOGAPS; Hogan and
Rosmond 1991), the United Kingdom Meteorological Office global model (UKMO;
Cullen 1993), and the Geophysical Fluid Dynamics Laboratory Hurricane
Prediction System (GFDL; Kurihara et al. 1993, 1995, 1998). The resulting
multi-model ensemble forecast reduced 24, 48, and 72 h errors by 16%, 20%, and
23%, respectively. In the same study, Goerss analyzed the 1997 North Pacific
tropical cyclones using the NOGAPS, the UKMO, and the global spectral model
(GSM; Kuma 1996). The ensemble forecast reduced forecast errors by 16%,
13%, and 12% at 24, 48, and 72 h. The NOGAPS model underperformed the
GSM and UKMO models during the 1997 North Pacific tropical cyclone season,
which raised the question of whether an ensemble based on the UKMO and
GSM models would perform better than the three-model ensemble. Although the
difference was not statistically significant, the three-model ensemble
consistently outperformed the two-model ensemble.

2.  Complex  Adaptive  Systems 

A complex adaptive system (CAS) is a system whose properties are not
fully explained by an understanding of its component parts. Complex
systems consist of a large number of mutually interacting and interwoven
parts, entities or agents (Wikipedia 2005). Examples of complex adaptive
systems are social organizations, economies, traffic, and weather. A CAS
operates based on three principles: order is emergent as opposed to
predetermined, the system’s history is irreversible, and the system’s future is often
unpredictable (Dooley 1996). The basic elements of a CAS are agents. An
agent is a software representation of a decision-making unit. Agents have
unique traits or personalities, which guide their performance and adaptability.
Their actions are based on internal decision rules that depend on imperfect local
information (Koritarov 2004).

3.  The  El  Farol  Problem 

The idea for this experiment was based on the ‘El Farol Bar’ problem
introduced in 1994 by Brian Arthur (Edmonds 1998). In this problem, a group of
agents must decide whether to go to a bar each Thursday night to listen to live
music. All agents like to go to the bar unless it is too crowded, that is, if more
than 60% of the agents go. Each agent is armed with a set of local predictors to
help it determine whether it should go to the bar. In this case, a predictor might
be the average attendance for the past four weeks, the best performing agent’s
focal predictor, or simply last week’s attendance number. Each agent is
randomly given a personality of extrovert, introvert, or neutral. How an agent
changes is determined by its personality. For example, if the bar
were overcrowded one night, the extrovert would decrease its fitness by 1.
However, an introvert would decrease its fitness by 3, since an introvert
personality does not like large social situations. The fitness of an agent is a
numerical assessment of how well an agent is performing. Once an agent’s
fitness level declines to a predetermined level, it will switch out predictors in an
effort to become fit. The result of the ‘El Farol Bar’ problem is that, after
an initial variability above and below the 60% threshold, the attendance levels out
at 60%. This is a classic example of agents being able to transform their
composition to achieve a happy outcome.

4.  Hypothesis 

The  hypothesis  for  this  experiment  is  that  a  complex  adaptive  system, 
based  on  the  principles  from  the  El  Farol  problem,  will  create  a  ‘smart’  ensemble 
forecast  that  will  have  less  error  than  the  consensus  forecast  as  defined  by 
Goerss  (2000).  Chapter  II  will  discuss  the  data  used  and  the  system  design  of 
the Tropical Agent Forecaster (TAF) program. Included in this will be a breakdown
of the responsibilities of each major section in the TAF. The analysis of
results  for  the  2004  Atlantic  hurricane  season  will  be  covered  in  Chapter  III.  This 
will  include  a  comparison  of  the  TAF  program  results  and  the  consensus  forecast 
results.  Chapter  IV  will  define  the  conclusions  and  future  work  possibilities  to 
further  enhance  the  complex  adaptive  system  approach  to  forecasting 
hurricanes. 
II.  METHODOLOGY


A.  DATA 

For this experiment, the Automated Tropical Cyclone Forecasting System
(ATCF; Sampson and Schrader 2000) output files for the 2004 Atlantic hurricane
season were used to define forecast positions from the suite of numerical models
used at the National Hurricane Center (NHC). Specifically, the interpolated
versions of the previously mentioned NOGAPS (NGPI), UKMO (UKMI), and GFDL
(GFDI), as well as the National Centers for Environmental Prediction Aviation
global model [NCEP AVN (AVNI); Surgi et al. 1998; Lord 1991] and the
Geophysical Fluid Dynamics Laboratory - Navy Model (GFNI), were used. Each
storm  data  file  contains  all  forecasts,  in  12-hour  increments  from  00  -  120  h,  for 
the  different  models.  The  verifying  data,  in  6  h  positions,  are  the  best-track  files 
pulled  from  the  ATCF.  A  hurricane  is  identified  if  it  has  a  wind  speed  of  greater 
than  or  equal  to  25  knots.  In  order  to  provide  more  feedback  to  the  agents,  the 
individual  dynamic  models  were  interpolated  into  6  h  forecasts  using  a  simple 
linear  average.  The  starting  date-time-group  (DTG)  for  each  storm  is  determined 
by  finding  a  common  DTG  for  all  five  models.  The  ending  DTG,  for  this 
experiment,  is  set  by  the  last  available  forecast  from  the  NGPI  model. 
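The 6 h interpolation step described above can be illustrated with a minimal Java sketch. It assumes that the "simple linear average" is the arithmetic midpoint of two consecutive 12-hourly positions; the class `TrackInterpolator`, the record `Fix`, and the method `toSixHourly` are hypothetical names introduced only for illustration and are not taken from the TAF source code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: fills 6 h gaps in a 12-hourly forecast track by midpoint averaging. */
final class TrackInterpolator {

    /** A forecast position: lead time in hours, latitude and longitude in degrees. */
    record Fix(int tau, double lat, double lon) {}

    /**
     * Inserts the arithmetic midpoint between consecutive fixes that are 12 h apart.
     * Assumption: the track does not cross the 180-degree meridian, so longitudes
     * can be averaged directly.
     */
    static List<Fix> toSixHourly(List<Fix> twelveHourly) {
        List<Fix> out = new ArrayList<>();
        for (int i = 0; i < twelveHourly.size(); i++) {
            Fix cur = twelveHourly.get(i);
            out.add(cur);
            if (i + 1 < twelveHourly.size()) {
                Fix next = twelveHourly.get(i + 1);
                if (next.tau() - cur.tau() == 12) {
                    out.add(new Fix(cur.tau() + 6,
                            (cur.lat() + next.lat()) / 2.0,
                            (cur.lon() + next.lon()) / 2.0));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Fix> track = List.of(
                new Fix(0, 25.0, 70.0), new Fix(12, 26.0, 72.0), new Fix(24, 27.5, 73.0));
        toSixHourly(track).forEach(System.out::println);
    }
}
```

Fixes that are not exactly 12 h apart (for example, when a forecast time is missing) are left without a midpoint in this sketch rather than being bridged.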

B.  ERROR  CALCULATIONS 

The  distance  in  nautical  miles  between  the  verifying  position  and  the 
forecast  position  defines  the  measure  of  how  well  the  system  performs.  The 
forecast position error for model i, $E_i$, is defined to be

$$E_i = \sqrt{C_i^2 + A_i^2}, \qquad (1)$$

where $C_i$ and $A_i$ are the across track and along track errors, respectively (Goerss
2000).  For  this  experiment  we  are  not  concerned  with  whether  the  position  lags 
or  leads  the  best  track  position.  Speed  and  direction  are  not  part  of  determining 
how  well  a  predictor  or  forecast  performs. 
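Because the across track and along track components combine into the total separation between the forecast and verifying positions, the error can also be sketched as a single distance. The text does not state how the distance is computed; the Java sketch below assumes a haversine great-circle distance on a spherical Earth, and the class name `PositionError`, the method `distanceNm`, and the radius constant are illustrative assumptions.

```java
/** Hypothetical helper: great-circle distance between a forecast and a best-track position. */
final class PositionError {

    /** Assumed mean Earth radius in nautical miles (about 6371 km / 1.852 km per nm). */
    static final double EARTH_RADIUS_NM = 3440.07;

    /** Haversine distance in nautical miles; all inputs are in degrees. */
    static double distanceNm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp guards against rounding slightly above 1 for near-antipodal points.
        return 2.0 * EARTH_RADIUS_NM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator is roughly 60 nm.
        System.out.println(distanceNm(0.0, 0.0, 0.0, 1.0));
    }
}
```

The result is unsigned, which matches the statement that the experiment does not distinguish whether a forecast lags or leads the best-track position.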
C.  SYSTEM  DESIGN

The  Tropical  Agent  Forecaster  program  is  written  using  the  object-oriented 
Java  programming  language.  The  basic  information  flow  of  the  program  (Figure 
1)  is  contained  in  three  levels,  defined  as  the  predictors,  the  agents,  and  the 
tropical  agent  forecaster. 


[Figure 1 image: 12 h information flow among the tropical agent forecaster level, the agent level, and the predictor level]

Figure 1. TAF levels and Information Flow

1.  Predictors  Level

The  predictors  level  contains  all  the  possible  combinations  of  the  five 
dynamic  models.  Each  model  combination  is  a  separate  predictor  and  is 
available  for  each  forecast  time  (6,  12,  18,  24,  30,  36,  42,  48,  54,  60,  66,  72,  96, 
120  h).  The  two  functions  of  a  predictor  are  1)  get  a  historical  position  given  a 
DTG  and  a  forecast  time  and  2)  when  directed,  get  a  forecast  position  for  a  future 
DTG  and  forecast  time.  An  example  of  a  predictor  is  the  UKMI  NGPI  AVNI 
predictor.  This  predictor  is  a  linear  combination  of  positions  from  each  of  the 
specified models at a given forecast time. For this predictor to be created, all
three  models  must  be  available  for  the  given  DTG  and  forecast  time.  If  one 
model  is  not  available,  the  position  is  set  to  latitude  0.0,  longitude  0.0.  This  will 
result  in  high  error  numbers  and  the  combination  will  not  be  used  for  the  current 
DTG.  Subsequently,  there  are  several  other  predictors  that  can  come  from  this 
combination,  such  as  an  AVNI  NGPI  predictor,  a  NGPI  UKMI  predictor,  an  AVNI 
predictor,  etc.  All  possible  combinations  of  predictors  are  evaluated  in  6  h 
increments  and  for  each  possible  forecast  time. 
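The set of predictors can be enumerated mechanically, as the following Java sketch suggests. It assumes that "all possible combinations" means every non-empty subset of the five models (31 in total) and that the linear combination is an equal-weight mean of the member positions; the class `PredictorCombinations` and its methods are hypothetical names, not the TAF program's own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the predictor level: model subsets and their averaged position. */
final class PredictorCombinations {

    static final String[] MODELS = {"AVNI", "GFDI", "GFNI", "NGPI", "UKMI"};

    /** Enumerates every non-empty subset of MODELS with a bitmask (2^5 - 1 = 31 subsets). */
    static List<List<String>> allCombinations() {
        List<List<String>> combos = new ArrayList<>();
        for (int mask = 1; mask < (1 << MODELS.length); mask++) {
            List<String> members = new ArrayList<>();
            for (int i = 0; i < MODELS.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    members.add(MODELS[i]);
                }
            }
            combos.add(members);
        }
        return combos;
    }

    /**
     * Equal-weight mean of member positions, each given as {lat, lon} in degrees.
     * Returns {0.0, 0.0} when any member is missing (null), mirroring the sentinel
     * position described in the text.
     */
    static double[] combine(List<double[]> memberPositions) {
        double lat = 0.0;
        double lon = 0.0;
        for (double[] p : memberPositions) {
            if (p == null) {
                return new double[] {0.0, 0.0};
            }
            lat += p[0];
            lon += p[1];
        }
        int n = memberPositions.size();
        return new double[] {lat / n, lon / n};
    }

    public static void main(String[] args) {
        System.out.println(allCombinations().size() + " predictors per forecast time");
    }
}
```

With 14 forecast times, this enumeration would give 31 candidate predictors at each forecast time.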

2.  Agent  Level 

The  building  blocks  of  any  complex  adaptive  system  are  the  agents.  From 
a  programming  point  of  view,  agents  are  active  objects  that  have  been  defined  to 
simulate  parts  of  a  model  (Amin  and  Ballard  2000).  Agents  have  the  ability  to 
evolve in response to their environment. In our program, agents are randomly
given a set of eight predictors for each possible forecast time. That is to say, a
set of eight 6-hour predictors is randomly assigned, a set of eight 12-hour
predictors is randomly assigned, etc., until all forecast times have been included.
The predictor sets for 6 and 12 hours will not be the same. In the end, each
agent will have fourteen sets of eight predictors.

The item that differentiates one agent’s behavior from another’s is its
personality. In our program, an agent is either tolerant or intolerant of error.
Each agent will weight its local predictors based on its personality. A tolerant
agent will react more slowly to underperforming predictors, while an intolerant agent
will want to quickly swap out predictors that are underperforming. An example of
how error tolerance differs between the two personalities for a 12-hour prediction
is provided in Figure 2. The effect is to place a target over the current position of
the hurricane. The agent, for 12-hour predictors, will look back 12 hours and get
the 12-hour forecast for each local predictor. This 12-hour forecast is valid for
the current DTG. The intolerant agent will assign a weight of +4, 0, -4, or -8 to a
local predictor if its 12-hour forecast position falls within 0-30 nm, 31-45 nm,
46-60 nm, or more than 60 nm of the current position, respectively. A tolerant
agent will assign a weight of +4, 0, -4, or -8 to a local predictor if its 12-hour
forecast position falls within 0-42 nm, 43-60 nm, 61-90 nm, or more than 90 nm,
respectively. The 12-hour radius for the intolerant agent was set just below the
12-hour total average error of the five models during the season.
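The 12-hour weighting rule for the two personalities can be written as a small lookup, sketched here in Java. The ring radii and weight values come from the text; the names `PersonalityWeights`, `Personality`, and `weightChange` are hypothetical, and treating each band as inclusive of its upper bound (so that an error of 30.5 nm falls in the 31-45 nm band) is an assumption, since the text gives only whole-number ranges.

```java
/** Hypothetical sketch of the 12-hour weighting rule for the two agent personalities. */
final class PersonalityWeights {

    enum Personality {
        INTOLERANT(30.0, 45.0, 60.0),
        TOLERANT(42.0, 60.0, 90.0);

        // 12-hour ring radii in nautical miles, taken from the text.
        final double inner;
        final double middle;
        final double outer;

        Personality(double inner, double middle, double outer) {
            this.inner = inner;
            this.middle = middle;
            this.outer = outer;
        }
    }

    /** Returns the weight (+4, 0, -4, or -8) earned by a 12-hour forecast error in nm. */
    static int weightChange(Personality personality, double errorNm) {
        if (errorNm <= personality.inner) {
            return 4;
        }
        if (errorNm <= personality.middle) {
            return 0;
        }
        if (errorNm <= personality.outer) {
            return -4;
        }
        return -8;
    }

    public static void main(String[] args) {
        // The same 40 nm error is rewarded by a tolerant agent and is neutral for an intolerant one.
        System.out.println(weightChange(Personality.TOLERANT, 40.0));
        System.out.println(weightChange(Personality.INTOLERANT, 40.0));
    }
}
```

Only the 12-hour radii are given in the text, so the sketch does not attempt to supply radii for the other forecast times.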


[Figure 2 image: 12 h agent personality comparison, with target rings for the intolerant agent and the tolerant agent]

Figure 2. A 12-hour Agent Personality Comparison

Agents  have  the  ability  to  swap  out  predictors  once  the  predictor’s  weight 
has  fallen  below  a  designated  fitness  value.  For  this  experiment,  the  fitness 
value  has  been  set  at  -12.  After  each  iteration  through  the  forecast  cycle,  the 
agent  checks  the  local  weights  of  its  predictors.  If  a  predictor  has  a  weight  that  is 
below  the  fitness  value,  the  agent  will  request  a  new  predictor.  This  new 
predictor  is  guaranteed  not  to  be  the  same  predictor  that  was  just  swapped  out. 
This  new  predictor  comes  into  the  agent’s  set  of  predictors  with  a  weight  of  0. 
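The swap rule can be sketched in Java as follows. The threshold of -12, the starting weight of 0, and the guarantee that the replacement differs from the predictor just removed come from the text; the class `PredictorSwap`, the `Slot` type, and the uniform random draw from a pool of predictor names are assumptions made for illustration. The text does not say whether the replacement may duplicate another predictor the agent already holds, and this sketch does not prevent it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the predictor swap performed after each forecast cycle. */
final class PredictorSwap {

    /** Fitness value from the text: predictors weighted below this are replaced. */
    static final int FITNESS_THRESHOLD = -12;

    /** Mutable pair of a predictor name and its running local weight. */
    static final class Slot {
        String name;
        int weight;

        Slot(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    /**
     * Replaces every slot whose weight is below the threshold with a randomly drawn
     * predictor from the pool, never the one just removed. The newcomer starts at 0.
     */
    static void swapUnfit(List<Slot> slots, List<String> pool, Random rng) {
        for (Slot slot : slots) {
            if (slot.weight < FITNESS_THRESHOLD) {
                List<String> candidates = new ArrayList<>(pool);
                candidates.remove(slot.name);
                if (candidates.isEmpty()) {
                    continue; // nothing else to swap in
                }
                slot.name = candidates.get(rng.nextInt(candidates.size()));
                slot.weight = 0;
            }
        }
    }

    public static void main(String[] args) {
        List<Slot> slots = new ArrayList<>();
        slots.add(new Slot("AVNI", -16));
        slots.add(new Slot("GFDI NGPI", 4));
        swapUnfit(slots, List.of("AVNI", "GFDI NGPI", "UKMI"), new Random());
        for (Slot s : slots) {
            System.out.println(s.name + " " + s.weight);
        }
    }
}
```

Because weights change in steps of 4 from a starting value of 0, a predictor at exactly -12 is retained under this reading, and a swap is triggered at -16.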

Once an agent has assessed the performance of each set of its
predictors, the agent must designate its best performing predictor. The best
performing predictor is the one with the highest weight. If more than one
predictor has the same weight, one is chosen randomly from the evenly weighted
predictors. The best performing predictor for each forecast time per agent is
made available to the tropical agent forecaster level. For each DTG an agent will
present 14 predictors, one for each forecast time, to the tropical agent forecaster.

3.  Tropical  Agent  Forecaster  Level 

The  tropical  agent  forecaster  (TAF)  is  responsible  for  generating  the 
official  forecast  for  the  system.  The  TAF  polls  the  different  agents  for  their  best 
predictors  for  each  forecast  time.  Much  like  how  it  is  done  within  each  agent,  the 
TAF  selects  the  predictors  for  each  forecast  time  with  the  highest  weight.  More 
often  than  not,  there  is  more  than  one  predictor  with  the  same  weight.  This  is 
where  the  TAF  predictor  selection  differs  from  the  agents.  The  TAF  does  not 
randomly  pick  one  predictor,  but  rather  it  simply  eliminates  duplicate  predictors. 
What  is  left  is  a  set  of  equally  weighted,  unique  predictors.  The  TAF  then  gets  a 
forecast  position  for  each  predictor  in  the  set.  To  output  only  one  forecast 
position,  the  TAF  performs  a  linear  average  of  the  highest  weighted  forecast 
predictors. 
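A minimal Java sketch of this selection step follows. It keeps the nominations that share the highest weight, removes duplicates, and averages the remaining forecast positions. The names `ForecasterSelection`, `Nomination`, `topPredictors`, and `finalPosition` are hypothetical, and the equal-weight mean of latitude and longitude is an assumed reading of "linear average".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the forecaster level: pick top-weighted predictors, then average. */
final class ForecasterSelection {

    /** An agent's nominated predictor for one forecast time, with its local weight. */
    record Nomination(String predictor, int weight) {}

    /** Returns the unique predictor names that share the highest nominated weight. */
    static Set<String> topPredictors(List<Nomination> nominations) {
        int best = Integer.MIN_VALUE;
        for (Nomination n : nominations) {
            best = Math.max(best, n.weight());
        }
        Set<String> unique = new LinkedHashSet<>(); // a set drops duplicate nominations
        for (Nomination n : nominations) {
            if (n.weight() == best) {
                unique.add(n.predictor());
            }
        }
        return unique;
    }

    /** Averages the {lat, lon} forecasts of the selected predictors into one position. */
    static double[] finalPosition(Set<String> selected, Map<String, double[]> forecasts) {
        double lat = 0.0;
        double lon = 0.0;
        for (String name : selected) {
            double[] p = forecasts.get(name);
            lat += p[0];
            lon += p[1];
        }
        return new double[] {lat / selected.size(), lon / selected.size()};
    }

    public static void main(String[] args) {
        List<Nomination> nominations = List.of(
                new Nomination("AVNI", 8), new Nomination("AVNI NGPI", 8),
                new Nomination("AVNI", 8), new Nomination("UKMI", 4));
        System.out.println(topPredictors(nominations));
    }
}
```

In this example two agents nominate the same AVNI predictor, so it contributes only once to the final average.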


4.  Program  Information  Flow 

Upon  program  initialization,  the  user  selects  the  storm  to  analyze.  Once 
the  storm  has  been  selected,  the  ATCF  data  fields  for  that  storm  are  loaded  and 
the  model  data  is  interpolated  into  6  h  increments.  After  data  have  been 
ingested,  the  agents  are  created.  Each  agent  is  randomly  given  a  personality 
and  a  set  of  8  predictors  for  each  forecast  time.  Now  that  each  agent  has  all  the 
information  it  needs,  it  begins  processing  the  data  fields. 

The TAF, like any other CAS, needs a history in order to learn and make
forecasts. Since at the start of a storm there is no history available, the program
must wait 6 hours until it can look back 6 hours to assess performance. After 6
hours, the agents will process their set of 6-hour predictors. A 6-hour predictor
will look back 6 hours and get its 6-hour forecast. This forecast will be compared
with the current position of the hurricane and the error will be calculated. Once
the errors have been calculated for each of the 6-hour predictors, each agent will
adjust its local predictor weights based on performance. Each agent then
checks its fitness level and, if it is below the threshold of -12, swaps out its
worst performing predictor. Each agent passes its best predictor to the tropical
agent forecaster. Once the tropical agent forecaster has each agent’s 6-hour
prediction, it finds the highest weighted predictors and eliminates duplicates.
With this final set of best predictors, the tropical agent forecaster gets the
6-hour forecast position from each predictor. These positions are averaged to
produce the final 6-hour forecast position. The tropical agent forecaster
calculates the forecast error from the best track data and writes this
information to a forecast file.

This process is repeated until it reaches the ending DTG. On the second
time through the loop, the program is 12 hours into its analysis. The 6-hour
predictors are processed again and now the 12-hour predictors begin to be processed
(Figure 3). The process of getting a 12-hour forecast involves going back in time
to process the predictors, assigning weights to predictors, and generating a
forecast based on the highest weighted predictors. The end result after 12 hours
is both a 6-hour and a 12-hour forecast. Every 6 hours another set of predictors
is introduced into the system and another forecast is added. The program will
generate forecasts in 6-hour increments up to 72 hours and then it generates
96- and 120-hour forecasts. What distinguishes this forecast position from that
of any other multi-model ensemble is the different models and model combinations
used to generate the position.
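The way forecast times enter the loop can be sketched in Java. The list of 14 forecast times is taken from the text; the class `ForecastSchedule` and the method `verifiableTimes` are hypothetical names, and the rule that a forecast time becomes active once that many hours of history exist is an assumed reading of the look-back description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: which forecast times can be processed at a given point in the storm. */
final class ForecastSchedule {

    /** The 14 forecast times (hours) used by the predictors. */
    static final int[] FORECAST_TIMES = {6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 96, 120};

    /**
     * Returns the forecast times whose predictors can look back far enough to be
     * verified, given the hours elapsed since the starting DTG.
     */
    static List<Integer> verifiableTimes(int hoursSinceStart) {
        List<Integer> active = new ArrayList<>();
        for (int tau : FORECAST_TIMES) {
            if (tau <= hoursSinceStart) {
                active.add(tau);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // Step through the first day of a storm in 6 h increments.
        for (int elapsed = 0; elapsed <= 24; elapsed += 6) {
            System.out.println(elapsed + " h: " + verifiableTimes(elapsed));
        }
    }
}
```

Under this reading, the 96- and 120-hour predictors accumulate no weights until four and five days into the storm, which limits the number of long-range cases available for short-lived systems.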


[Figure 3 image: 12-hour forecast example plotted against time]

Figure 3. A 12-hour forecast example. The stars represent best-track
positions, and the circles with an x inside indicate average positions
between models. The four-pointed star is the final forecast position
after averaging the forecast positions of the highest weighted predictors.


A  final  forecast  position  that  is  based  on  an  average  of  the  AVNI  forecast, 
the  UKMI  NGPI  forecast,  the  AVNI  NGPI  forecast,  and  the  AVNI  UKMI  NGPI 
forecast  is  presented  in  Figure  3.  Below  is  a  sample  of  the  typical  output  for  a 
12-hour  forecast  position. 

20048212,
12,
32.2, 77.9,
19.377499622275813,
AVNI UKMI GFDI 12 hour predictor, AVNI UKMI NGPI GFDI 12 hour predictor,
AVNI 12 hour predictor, GFDI 12 hour predictor, AVNI and GFDI 12 hour predictor,
AVNI UKMI NGPI 12 hour predictor, AVNI and NGPI 12 hour predictor,

The first line is the current DTG. In this case it is August 2, 2004 at 12Z.
The second line indicates this is a 12-hour forecast, and the third line gives the
forecast position for August 3, 2004 at 00Z. The fourth line indicates the error,
in nautical miles, associated with the forecast. The last group of lines shows all
the models and model combinations that went into generating the final forecast position.


D.  CONSENSUS  FORECASTS 

The goal of this experiment is for the TAF program’s forecast errors to be
significantly less than those of the consensus forecast (CONU). The CONU
forecast is a linear combination of individual model forecast positions. The
CONU forecast used for comparison in this experiment is composed of the AVNI,
GFDI, GFNI, NGPI, and UKMI models. Goerss (2000) showed that a
CONU forecast containing three models (UKMO, GFDL, NOGAPS) outperformed
individual models throughout the course of the 1995-96 Atlantic hurricane
seasons. In a study of the 1997 North Pacific tropical cyclones, the three-model
consensus forecast again beat the individual model forecasts. In this case, two
of the three individual models significantly outperformed the third model. This
led to the question of whether a two-model consensus forecast would produce
better results. Despite the better individual performance of the two models, the
three-model consensus forecast consistently outperformed the two-model
consensus forecast, although not by a statistically significant margin
(Goerss 2000). The determination was made that a consensus forecast should
contain a minimum of three numerical models.
III.  RESULTS  AND  ANALYSIS


A.  RESULTS 

A  homogeneous  comparison  of  the  hurricane  track  performance  of  the 
NGPI,  GFDI,  GFNI,  AVNI,  UKMI,  CONU,  and  TAF  is  presented  in  Table  1  for  the 
2004  Atlantic  hurricane  season. 


Table 1.  Total errors (nm) for 2004 Atlantic hurricane season

            12 h    24 h    36 h    48 h    72 h    96 h   120 h
NGPI        39.8    73.2   101.1   137.6   219.2   271.0   375.8
GFNI        41.6    77.5   107.5   156.8   209.3   228.6   416.1
AVNI        38.8    69.3    98.6   147.8   180.1   171.7   291.9
UKMI        40.9    68.9    90.4   124.6   164.4   250.2   234.4
GFDI        34.8    63.2    91.0   140.0   169.4   236.1   279.6
CONU        33.9    61.4    82.8   122.5   152.5   169.3   270.0
TAF         34.8    59.6    87.6   137.9   166.6   190.9   249.4
CASES        186     160     143     113      66      38      20

Hurricane  forecast  errors  for  the  five  models  and  the  consensus  ensemble 
were  gathered  using  software  from  the  ATCF  system.  The  TAF  forecast  errors 
were output from the program described in Chapter II. A Student’s t-test (Wilks
1993)  was  performed  to  assess  the  statistical  significance  between  the  errors 
associated  with  the  TAF  and  CONU  forecasts.  At  12  and  24  hours,  the 
differences  between  the  TAF  forecasts  and  the  CONU  forecasts  are  not 
statistically  significant.  The  TAF  program  performed  significantly  worse  at  36,  48, 
and  72  hours.  At  96  hours,  the  TAF  program  was  outperformed  by  the  CONU 
right  at  the  95%  level,  while  at  120  hours  the  TAF  program  performance  was 
significantly  better  than  the  CONU.  The  remaining  analysis  will  focus  on  72  - 
120  hours  since  the  ensemble  forecasts  are  most  beneficial  at  the  longer 
forecast  intervals  where  the  spread  between  models  tends  to  increase. 
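The text does not say which form of the t-test was applied. As one possible reading, the Java sketch below computes a paired t statistic on homogeneous TAF and CONU errors, where each pair is the two forecast errors for the same DTG and forecast time. The pairing, the class `PairedTStatistic`, and the method `tStatistic` are assumptions, and the sketch ignores any serial correlation between consecutive forecasts.

```java
/** Hypothetical sketch: paired t statistic for two homogeneous sets of forecast errors. */
final class PairedTStatistic {

    /**
     * Computes t = mean(d) / (s_d / sqrt(n)), where d[i] = a[i] - b[i] and s_d is the
     * sample standard deviation of the differences (n - 1 in the denominator).
     */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples with n >= 2");
        }
        int n = a.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (a[i] - b[i]) - mean;
            sumSquares += deviation * deviation;
        }
        double sd = Math.sqrt(sumSquares / (n - 1));
        return mean / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not values from the experiment.
        double[] first = {11.0, 22.0, 33.0};
        double[] second = {10.0, 20.0, 30.0};
        System.out.println(tStatistic(first, second));
    }
}
```

The resulting statistic would be compared against a t distribution with n - 1 degrees of freedom to judge significance at a chosen level such as 95%.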

Based on Table 1, it was necessary to examine the individual forecasts
preferred by the TAF to determine why its forecast errors were greater than the
CONU errors. Initially, the number of times the TAF program gave a better
forecast than the CONU was identified (Table 2). At 72 hours, the TAF
program gave a better forecast than the CONU 44% of the time. This
percentage increased at 96 and 120 hours to 54% and 64%, respectively.
Because of the TAF design and use of predictors by the agents, it is possible that
a TAF forecast may be based on a single model or a combination of models. For
example, a 96-hour forecast may be the 96-hour forecast for the AVNI. This
would occur when the AVNI has been performing accurately in the past positions
such that it is assigned a high weight.

Based  on  the  results  in  Table  1,  the  number  of  TAF  forecasts  that  are 
based  on  a  single  model  is  defined  and  compared  with  the  number  of  times  TAF 
forecast  positions  are  based  on  combinations  of  model  positions  (e.g.  NGPI 
AVNI).  Identification  of  TAF  forecasts  based  on  a  single  model  revealed  that  the 
differences  between  the  CONU  and  single  model-based  TAF  forecasts  were 
large.  Using  the  number  of  times  the  TAF  program  selected  only  one  model,  the 
single  model  forecast  positions  data  were  removed  from  both  the  TAF  and 
CONU  output  and  the  average  errors  were  recalculated  on  the  new 
homogeneous set (Table 3). Both forecasts improved at 72 hours; however, the
improvement of the TAF program was significantly better. At 96 hours, the TAF
program went from performing significantly worse to outperforming the CONU,
although not at a significant level.
Table 2.  Comparison of the number of times the TAF program was better or worse
than the CONU for 72-120 h. Also indicated is the number of times individual
models were chosen by the TAF program.

                   BETTER                          WORSE
         ALL  SINGLE MODEL  COMBO      ALL  SINGLE MODEL  COMBO
72 h      29             1     28       36             4     32
96 h      21             1     20       17             4     13
120 h     13             2     11        7             3      4

The improvement at 96 hours for the TAF program after removing the
single model selections is significant. Both forecasts performed worse at 120
hours. The degradation of the 120-hour error is due to Hurricane Frances, for which
the TAF program rated the AVNI 120-hour predictor as the best predictor and
used it for every 120-hour forecast. Early in the lifetime of Frances, these AVNI
120-hour forecast positions vastly outperformed the CONU, but for the final three
forecasts, the CONU greatly outperformed the TAF’s selection of the AVNI
120-hour predictor.


Table 3.  Comparison of average errors (nm) for 72, 96, and 120 hours with single
models included and after removing single model selections

SINGLE MODELS INCLUDED
            72 h    96 h   120 h
CONU       152.5   169.3   270.0
TAF        166.6   190.9   249.4
CASES         65      38      20

SINGLE MODELS REMOVED
            72 h    96 h   120 h
CONU       143.4   177.2   390.0
TAF        150.1   173.1   350.1
CASES         60      33      15

After  removing  the  single  models,  the  standard  deviations  were  greatly 
reduced  for  both  the  CONU  and  TAF  (Table  4).  The  decrease  in  standard 
deviation was most significant at 72 and 96 hours, while the increase at 120
hours  was  somewhat  related  to  the  small  sample  size. 

These  results  led  to  the  conclusion  that  single  model  performance  will 
either greatly outperform or underperform a consensus model forecast and will
lead  to  a  higher  standard  deviation  for  forecast  errors.  Therefore,  the  key  is  to 
recognize  when  a  single  model  is  performing  well.  The  TAF  approach  uses  past 
model  performance  as  a  predictor  to  define  when  an  individual  model  is 
performing  well.  Unfortunately,  results  in  Tables  2  and  3  suggest  that  past  model 
performance  is  not  always  related  to  future  performance.  Therefore,  the  TAF 
system  may  choose  a  single  model  forecast  more  often  than  a  combination  of 
models. 


Table 4.  Comparison of standard deviations (nm) with single models included
and without single models included.

STANDARD DEVIATIONS (NM) WITH SINGLE MODELS INCLUDED
            72 h    96 h   120 h
CONU          67     113     131
TAF           83     112     137
CASES         65      38      20

STANDARD DEVIATIONS (NM) WITH SINGLE MODELS REMOVED
            72 h    96 h   120 h
CONU          67     106      89
TAF           78      96     116
CASES         60      33      15


Based on the above analysis, it was decided to investigate the impact on
the TAF forecast that results from an increased number of predictors. The
number of times the TAF program made a forecast using one predictor, two
predictors, or three or more predictors was tallied (Table 5). In this case, a
single predictor is made up of a combination of more than one model. The
TAF program selected a single predictor as its forecast solution 18 times between
72 and 120 hours. A two-predictor forecast is one in which the final forecast is
made up of two forecast predictors averaged together. It follows that a forecast
based on three or more predictors uses an average of three or more forecast
positions. When examining the average error for each of these three categories,
the average error of the three-or-more-predictor TAF forecast was lower than that
of the CONU model at each of the 72-, 96-, and 120-hour forecast intervals.
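
The averaging of predictor positions described above can be sketched in a few
lines of Python. This is only an illustration: the function name
average_position and the plain arithmetic mean of latitude and longitude are
assumptions, not the TAF code, and such a mean is only reasonable for positions
that do not straddle the 180-degree meridian.

```python
from typing import List, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in degrees


def average_position(predictor_positions: List[Position]) -> Position:
    """Return the equal-weight mean of the supplied forecast positions.

    Assumption: a simple arithmetic mean of latitude and longitude, which is
    adequate only when the positions do not cross the 180-degree meridian.
    """
    if not predictor_positions:
        raise ValueError("at least one predictor position is required")
    n = len(predictor_positions)
    mean_lat = sum(lat for lat, _ in predictor_positions) / n
    mean_lon = sum(lon for _, lon in predictor_positions) / n
    return (mean_lat, mean_lon)


# Hypothetical three-predictor forecast for a single forecast time.
three_predictor_forecast = average_position(
    [(20.0, -80.0), (22.0, -82.0), (24.0, -84.0)]
)
print(three_predictor_forecast)
```

With a single predictor the function simply returns that predictor's position,
so the one-, two-, and three-or-more-predictor categories differ only in how
many positions are passed in.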


Table 5. A homogeneous comparison of TAF and CONU forecast errors (nm) when
the TAF program selected one predictor, two predictors, or three or more
predictors to generate the forecast position

          1 PREDICTOR           2 PREDICTORS          3 OR MORE PREDICTORS
          CONU       TAF        CONU       TAF        CONU       TAF
72 h      153.4      168        83.7       134.2      149.5      143.7
96 h      180.8      182.4      152.9      180.8      178.3      161.1
120 h     NO CASES   NO CASES   NO CASES   NO CASES   390        350.1
CASES     18         18         5          5          37         37

The average error for the TAF program decreased with a greater
number of predictors averaged to create the forecast. For 120 hours, all
forecasts were made with three or more predictors. At 72 and 96 hours, the
selection of two predictors occurred only 8% of the time. Based on the
information in Table 5, the TAF program was modified to require that at least
three predictors be averaged to create the forecast position. This change only
affected the tropical agent forecaster level of the program. It did not change the
manner in which the agents weighted each predictor. The highest-weighted
predictor was always one of the predictors used in the final forecast. The tropical
agent would then take the next-highest-weighted predictors provided by the agents
and include them in the final forecast prediction.
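
A minimal sketch of the three-predictor rule follows, assuming each predictor
carries a single numeric weight supplied by the agents. The function
select_predictors, the predictor labels, the weights, and the handling of ties
at the top weight are all hypothetical; they stand in for the actual selection
logic, which is not reproduced here.

```python
from typing import Dict, List


def select_predictors(weights: Dict[str, float], min_count: int = 3) -> List[str]:
    """Pick the predictors to average for one forecast time.

    Assumptions (not taken from the TAF source code):
      * every predictor tied at the highest weight is always kept;
      * the next-highest-weighted predictors are then added until at least
        `min_count` predictors are selected (or none remain).
    """
    ranked = sorted(weights, key=lambda name: weights[name], reverse=True)
    if not ranked:
        return []
    top_weight = weights[ranked[0]]
    chosen = [name for name in ranked if weights[name] == top_weight]
    for name in ranked[len(chosen):]:
        if len(chosen) >= min_count:
            break
        chosen.append(name)
    return chosen


# Hypothetical agent weights for four predictors at one forecast time.
agent_weights = {"P1": 0.9, "P2": 0.5, "P3": 0.7, "P4": 0.2}
print(select_predictors(agent_weights, min_count=1))  # highest-weighted only
print(select_predictors(agent_weights, min_count=3))  # at least three predictors
```

Setting min_count to 1 mimics a selection that keeps only the highest-weighted
predictor, while min_count of 3 corresponds to the modified program. The
weights themselves are untouched, consistent with the change being confined to
the forecaster level.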

An examination of the 72- to 120-hour average errors for the modified TAF
program showed improved performance versus the CONU model. The average
forecast error for the modified TAF decreased from 166.6 nm, 190.9 nm, and
249.4 nm (see Table 1) to 148.4 nm, 185.5 nm, and 237.3 nm, respectively,
for 72, 96, and 120 hours. A t-test was performed to check the significance, at a
95% confidence level, of these new average forecast errors versus the CONU
model. At 72 hours, the TAF average error went from being significantly larger
than CONU to smaller than CONU. For 96 hours, the results were the same as
before, with a marginally significant difference that favored the CONU over the
TAF; however, the difference between the average errors was smaller. The
modified TAF remained significantly better than the CONU at 120 hours.
Standard deviations improved slightly at 72 and 96 hours but increased
slightly at 120 hours.
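
The significance check can be illustrated with a short sketch. It assumes a
paired t-test on a homogeneous sample (the same forecast cases for both
systems); the text does not state which form of the t-test was used, and the
error values below are invented purely for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """Paired t statistic for case-by-case error differences (a minus b).

    Assumption: both sequences hold errors (nm) for the same forecast cases,
    in the same order. A positive value means system a has the larger errors.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 cases")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (spread / math.sqrt(len(diffs)))


# Invented errors (nm) for four homogeneous cases, not thesis data.
conu_errors = [100.0, 120.0, 140.0, 160.0]
taf_errors = [90.0, 115.0, 120.0, 155.0]
t_value = paired_t_statistic(conu_errors, taf_errors)
print(round(t_value, 3))
```

The statistic is compared with the two-sided critical value of Student's t for
n - 1 degrees of freedom at the 95% level. For the four invented cases above
(3 degrees of freedom) that critical value is 3.182, so a mean difference of
10 nm in favor of the second system would not be judged significant.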

B.  CASE  STUDY 

Hurricane Ivan is presented as a case study to highlight an example of
when the TAF and TAF-3 programs provided a positive result when compared to
the CONU forecast. The complete sets of forecast tracks for CONU, TAF, and
TAF-3 for Hurricane Ivan (Figures 4, 5, and 6, respectively) show a right
(eastward) bias throughout the life of the storm. This is an indication that the
majority of models are forecasting positions to the right of the actual hurricane
track. It is not possible to eliminate the right-side bias with the current
configuration of the TAF and TAF-3 programs. The goal is to reduce this bias by
selecting predictors that do not include the largest-error models.

Figure 4. Hurricane Ivan complete set of CONU forecasts. The black
circles represent the best track positions in 6-h intervals. Forecast
positions are defined by alternating colors at 12-h intervals to 72
hours, then 24-h intervals to 120 hours.


Figure  5.  As  in  Figure  4,  except  for  the  TAF  forecasts. 


Figure 6. As in Figure 4, except for the TAF-3 forecasts.


The 96- and 120-hour errors for Hurricane Ivan indicate that early in the
storm all three forecasts are performing similarly (Figures 7 and 8). The TAF and
TAF-3 forecast errors are slightly less than the consensus forecast error at 1800
UTC on September 8, 2004 (highlighted with the blue rectangle). The green
rectangles (Figure 7) highlight the forecast errors at 0000 UTC and 1200 UTC on
September 11, 2004 and show that the TAF and TAF-3 96-hour forecast errors
are initially larger than the CONU, but the trend is reversed just twelve
hours later. At 120 hours (Figure 8) a similar trend is noticed, such that the
performance of the TAF and TAF-3 becomes significantly better than the
performance of the CONU.

Figure 7. CONU, TAF and TAF-3 96 h forecast errors for Hurricane Ivan


Figure 8. CONU, TAF and TAF-3 120 h forecast errors for Hurricane Ivan


When inspecting the forecast tracks that correspond to the highlighted
areas, the type of performance characteristics discussed above becomes evident.
At 1800 UTC on September 8, 2004, all three forecasts are performing in a
similar fashion as Hurricane Ivan heads into the Caribbean Sea (Figure 9). At
0000 UTC on September 11, all three forecasts for 96 and 120 hours are moving
the storm at a similar speed and there is an insignificant error difference favoring
the CONU (Figure 10). Stepping forward to 1200 UTC on September 11 (Figure
11), both the 96- and 120-hour forecasts for the TAF and TAF-3 are significantly
outperforming the CONU forecasts. The CONU has accelerated the storm
northward much more quickly than the TAF and TAF-3. This is caused by the
requirement that the CONU contain all models in creating its forecast position. In
this case, the NGPI 120-hour error was over 1300 nm. This drastically affected
the final position for the CONU. The TAF and TAF-3 did not accelerate the storm
since the NGPI was not included in any of the predictors used to make their 96-
and 120-hour forecasts. Therefore, this example illustrates the ability of the TAF
system to recognize that a model is performing poorly and to remove it as a
predictor for future positions.
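
The effect of one poorly performing model on an all-model consensus can be
sketched as follows. The positions and model labels are hypothetical and are
not the Ivan forecasts; the equal-weight mean of latitude and longitude, the
haversine distance, and the function names are assumptions made for
illustration.

```python
import math
from typing import Dict, Iterable, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_NM = 3440.065      # mean Earth radius in nautical miles


def great_circle_nm(a: Position, b: Position) -> float:
    """Haversine great-circle distance between two positions, in nm."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def consensus(forecasts: Dict[str, Position], exclude: Iterable[str] = ()) -> Position:
    """Equal-weight mean position of all forecasts not listed in `exclude`."""
    skipped = set(exclude)
    kept = [pos for name, pos in forecasts.items() if name not in skipped]
    if not kept:
        raise ValueError("no forecasts left to average")
    return (sum(p[0] for p in kept) / len(kept),
            sum(p[1] for p in kept) / len(kept))


# Hypothetical 120-hour forecasts; MODEL_C moves the storm far to the north.
best_track = (25.0, -85.0)
forecasts = {
    "MODEL_A": (25.5, -85.0),
    "MODEL_B": (24.5, -85.0),
    "MODEL_C": (45.0, -85.0),
}
error_all = great_circle_nm(consensus(forecasts), best_track)
error_trimmed = great_circle_nm(consensus(forecasts, exclude=["MODEL_C"]), best_track)
print(f"all models: {error_all:.0f} nm, without MODEL_C: {error_trimmed:.0f} nm")
```

In this invented example the single outlier pulls the all-model consensus
several hundred nautical miles north of the verifying position, whereas a
combination that leaves it out is not affected. This mirrors the behavior
described for the NGPI, without reproducing the actual forecast data.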


Figure 9. As in Figure 4, except for the 2004090818 forecast tracks for
CONU, TAF and TAF-3

CONU: green, TAF: red, TAF-3: blue


Figure 10. As in Figure 4, except for the 2004091100 forecast tracks for
CONU, TAF, and TAF-3


Figure 11. As in Figure 4, except for the 2004091112 forecast tracks for
CONU, TAF, and TAF-3

IV. CONCLUSIONS


A.  SUMMARY 

A complex adaptive system was created to forecast hurricane track
positions for the 2004 Atlantic hurricane season. The TAF program used
intelligent agents to create a ‘smart’ ensemble based on the historical
performance of both individual dynamic models and combinations of models. In
the initial application, the TAF system was unconstrained, such that the set of
highest-weighted predictors was used to produce a forecast position. Based on
the TAF design, a predictor may consist of a single model or a combination of
models. A single model may be the highest-weighted predictor when it has been
consistently producing highly accurate forecasts over the life of the hurricane to
date. Results using the unconstrained system indicated that the TAF forecasts
were statistically significantly better than a pure linear combination of all input
models only at 120 hours.

These results were examined to identify whether the use of single-model
predictors caused the TAF to have increased errors. Indeed, removal of
single-model-based forecasts improved the TAF forecast with respect to the linear
average of all models. Furthermore, the standard deviation of forecast errors
was greatly reduced when single-model forecasts were removed. This is
anticipated since the remaining predictors are based on a combination of
forecast models.

The final analysis investigated the impact on forecast accuracy from using
increased numbers of combination-based predictors. The TAF program, when
forced to use three or more predictors, consistently outperformed the CONU
forecasts for 72 hours, but the difference was not statistically significant. At 96
hours, the CONU still outperformed the TAF program; however, the average
error difference decreased. There is a statistically significant performance
improvement at 120 hours.

However, the ability to use a CAS to predict hurricane tracks has validity.
The application of an unconstrained system may need further examination. One
note of caution is that the continual averaging of forecast predictors into one final
forecast position decreases the importance of the agents. In an ideal CAS
application, one agent’s prediction should provide the answer, not a combination
of several agent predictions. However, this may be adversely impacted by the
fact that, with respect to hurricane track forecasting, past model performance is
not significantly correlated with future performance.


B.  FUTURE  WORK 

There are a number of ways to implement a complex adaptive system to
forecast hurricane tracks. The current TAF system is a first step in creating an
agent-based forecasting system. Based on this approach, the following
recommendations are provided to improve the application of a complex adaptive
system to hurricane track forecasting.

1. Remove Agent Restrictions

The agents in the TAF system are currently given a set of eight predictors
segmented into fourteen forecast times. The next generation of TAF should
remove the segmentation of the forecast time slots (i.e., 6, 12, 18...). An agent
should be given a total of eight predictors for fourteen forecast periods. This
would enable the best 6-hour predictor to compete as the best 48-hour predictor
and enable the agents to make decisions based on both the best currently
performing predictor and the best historically performing predictor. The need to
look back 96 hours to get the best predictor to forecast out 96 hours would
be reduced. This will hopefully lead to a more accurate forecast.

2. Creating History

For a complex adaptive system to work, it must have an accurate history.
For example, the current TAF system must wait 96 hours into the storm’s life in
order to produce a 96-hour forecast. This reduces the number of long-range
forecasts to an unacceptably low level, particularly when forecasting in the
Atlantic Ocean. It might prove that simply removing the agent restrictions noted
above will be sufficient to provide more accurate long-range forecasts. The
addition of climatology data to the system might prove useful in creating a
history that can be used to forecast longer ranges more accurately from the start
of the storm.
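
The way the history requirement limits the number of long-range forecasts can
be illustrated with simple arithmetic. The sketch below assumes 6-hourly
forecast cycles, that the first forecast for a given lead time is issued once
the storm has existed for that many hours, and that a forecast only counts if
the storm is still being tracked at the valid time so that it can be verified.
The function name and the 240-hour lifetime are hypothetical.

```python
def count_verifiable_forecasts(lifetime_h: int, lead_h: int, cycle_h: int = 6) -> int:
    """Count forecasts of a given lead time that can be issued and verified.

    Assumptions (for illustration only):
      * the first forecast is issued `lead_h` hours into the storm's life;
      * a forecast is verifiable only if its valid time falls within the
        storm's lifetime;
      * forecasts are issued every `cycle_h` hours.
    """
    first_issue = lead_h               # history requirement
    last_issue = lifetime_h - lead_h   # latest issue time that still verifies
    if last_issue < first_issue:
        return 0
    return (last_issue - first_issue) // cycle_h + 1


# Hypothetical storm tracked for 240 hours (10 days).
for lead in (72, 96, 120):
    print(lead, count_verifiable_forecasts(240, lead))
```

Under these assumptions the number of available cases falls off quickly as the
lead time grows, and a storm tracked for less than twice the lead time yields
no cases at all. A history built from other sources, such as climatology, would
relax the first assumption.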

3.  Pacific  Tropical  Cyclone  Analysis 

The TAF program should be used to analyze past Western North Pacific
tropical cyclone seasons. The tropical cyclones in the Western North Pacific
Ocean usually have longer tracks than those in the Atlantic Ocean. Additional
TAF output data collected for the 72-, 96-, and 120-hour forecast periods would
validate the Atlantic Ocean data. A simple data-ingest modification that enables
the TAF system to recognize which basin the storm is in would make this
analysis possible.